Similarity and Dissimilarity Methods for Processing Chemical Structure Databases
Abstract
This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases. The applications discussed include similarity searching, database clustering and diversity analysis, focusing upon measures that are based on fragment bit-string occurrence data. The paper then discusses recent work on the calculation of similarity by aligning molecular fields and on the selection of structurally diverse subsets of chemical databases.
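As a rough illustration of similarity searching over fragment bit-strings, the sketch below ranks a toy database against a query using the Tanimoto coefficient, a measure commonly applied to binary fingerprints. The fingerprints, molecule names, and the helper names `tanimoto` and `similarity_search` are hypothetical and are not taken from the paper. Returning 1.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as Python ints.

    Each set bit marks the presence of one fragment (assumed encoding).
    """
    common = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    either = bin(fp_a | fp_b).count("1")   # bits set in at least one
    # Assumption: two empty fingerprints are treated as identical.
    return common / either if either else 1.0


def similarity_search(query: int, database: dict, top_k: int = 3) -> list:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical 8-bit fingerprints, for illustration only.
database = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110000,
    "mol_c": 0b01001101,
}
query = 0b10110011

for name, score in similarity_search(query, database):
    print(f"{name}: {score:.3f}")
```

Real fingerprints are usually hundreds or thousands of bits long; Python integers are used here only because they make the bitwise operations easy to read.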
Related articles
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance
This paper reviews the use of measures of inter-molecular similarity for processing databases of chemical structures, which play an important role in the discovery of new drugs by the pharmaceutical industry. The similarity measures considered here are based on the use of a fingerprint representation of molecular structure, where a fingerprint is a vector encoding the presence of fragment subst...
Comparison of Similarity Coefficients for Clustering and Compound Selection
Recent studies into the use of a selection of similarity coefficients, when applied to searches of chemical databases represented by binary fingerprints, have shown considerable variation in their retrieval performance and in the sets of compounds being retrieved. The main factor influencing performance is the density distribution of the bitstrings for the active class, a feature which is close...
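To make the idea of differing similarity coefficients concrete, the sketch below computes three standard coefficients (Tanimoto, Dice, and cosine) for the same pair of binary fingerprints. The choice of these three and the function name `coefficients` are assumptions made for illustration. The snippet does not say which coefficients the study compared. The code assumes both fingerprints have at least one bit set.

```python
import math


def coefficients(fp_x: int, fp_y: int) -> dict:
    """Tanimoto, Dice and cosine coefficients for two non-empty bit-strings."""
    a = bin(fp_x).count("1")           # bits set in the first fingerprint
    b = bin(fp_y).count("1")           # bits set in the second fingerprint
    c = bin(fp_x & fp_y).count("1")    # bits set in both
    return {
        "tanimoto": c / (a + b - c),
        "dice": 2 * c / (a + b),
        "cosine": c / math.sqrt(a * b),
    }


# Hypothetical fingerprints of different density: 4 bits set versus 2.
for name, value in coefficients(0b1111, 0b0011).items():
    print(f"{name}: {value:.3f}")
```

The three values differ for the same pair because each coefficient normalises the shared bit count differently, which is one way bit-string density can affect rankings.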
Semi-supervised composite kernel learning using distance metric learning techniques
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
On the similarity metric and the distance metric
Similarity and dissimilarity measures are widely used in many research areas and applications. When a dissimilarity measure is used, it is normally required to be a distance metric. However, when a similarity measure is used, there is usually no formal requirement. In this talk, we will present the following results. We first present a formal definition of similarity metric. We then show the re...
Journal title:
- Comput. J.
Volume 41
Publication date 1998